Running head: INTELLIGIBILITY OF MODIFIED SPEECH

The Intelligibility of Modified Speech for Young Listeners with Normal and Impaired Hearing
Abstract
Exposure to modified speech has been shown to benefit language-learning impaired children with respect to their language skills (Tallal et al., 1996; Merzenich et al., 1998). In the study by Tallal and colleagues, the speech modification consisted of both slowing down and amplifying fast, transitional elements of speech. In this study, we examined whether the benefits of modified speech could be extended to provide intelligibility improvements for children with severe-to-profound hearing impairment who wear sensory aids. In addition, the separate effects on intelligibility of slowing down and amplifying were evaluated. Two groups of listeners were employed: eight severe-to-profoundly hearing-impaired children and five children with normal hearing. Four speech-processing conditions were tested: 1) natural, unprocessed speech; 2) envelope-amplified speech; 3) slowed speech; and 4) both slowed and envelope-amplified speech. For each condition, three types of speech materials were used: words in sentences, isolated words, and syllable-contrasts. To degrade the performance of the normal-hearing children, all testing was completed with a noise background. Results from the hearing-impaired children showed that all varieties of modified speech yielded either equivalent or poorer intelligibility than unprocessed speech. For words in sentences and isolated words, the slowing-down of speech had no effect on intelligibility scores while envelope-amplification, both alone and combined with slowing-down, yielded significantly lower scores. Intelligibility results from normal-hearing children listening in noise were somewhat similar to those from hearing-impaired children. For isolated words, the slowing-down of speech had no effect on intelligibility while envelope-amplification degraded intelligibility. For both subject groups, speech processing had no statistically significant effect on syllable discrimination. 
In summary, without extensive exposure to the speech processing conditions, children with impaired hearing and children with normal hearing listening in noise received no intelligibility advantage from either slowed speech or envelope-amplified speech.

The Intelligibility of Modified Speech for Young Listeners with Normal and Impaired Hearing

At oral schools for the deaf, such as Central Institute for the Deaf (CID), there is an obvious, critical need for speech to be delivered more intelligibly to children with impaired hearing. Even with the most advanced hearing aids or cochlear implants, severely and profoundly hearing-impaired children often have great difficulty perceiving speech (Fryauf-Bertschy, Tyler, Kelsay, Gantz & Woodworth, 1997). Such children with severe and profound hearing impairment may perceive correctly only 26% of the words presented to them (Kirk, Pisoni & Osberger, 1995). Our primary interest in studying a speech modification, one introduced recently by Tallal et al. (1996) for language-learning impaired (LLI) children, stems from this critical need to make speech more intelligible for severe-to-profoundly hearing-impaired children.

Intensive exposure to modified speech, as introduced by Tallal et al. (1996), is a major component of their multi-week training program designed for LLI children. Merzenich et al. (1998) and Tallal et al. (1996) report that this multi-week training program provides substantial benefit to LLI children with respect to their language scores. In the earlier study, Tallal et al. (1996) employed two groups of LLI children: 1) a test group which received the four-week intensive training program with modified speech; and 2) a control group which received the four-week intensive training program with unmodified speech. During this intensive program, speech occurred in the context of several computer-based training exercises.
After completion of the program, both the control and test groups showed improvements in speech and language scores. However, the test group, which was exposed to modified speech, achieved significantly greater gains than the control group, which was exposed to unmodified speech. Thus, it appears that the speech modification of Tallal et al. contributes to the observed language benefit beyond that which is achieved solely from the computer-based exercises.

The type of speech modification employed by Tallal et al. is motivated by the hypothesis that LLI children “have a ‘temporal processing deficit’ expressed by limited abilities at identifying some brief phonetic elements presented in specific speech contexts and by poor performances at identifying or sequencing short-duration acoustic stimuli presented in rapid succession” (Merzenich et al., 1996, p. 77). Their speech modification addresses this deficit in two ways. First, speech is uniformly slowed down by as much as 50%; second, fast-varying elements of the speech signal are amplified. The latter modification is referred to here as envelope amplification (Nagarajan et al., 1998). Both components of the modification are designed to enhance temporally short, time-varying speech elements, such as the formant transitions from a consonant to a vowel.

Though the speech modification employed by Tallal et al. was intended for LLI children, there are good reasons for exploring the effects of this speech modification for a different population, namely severe-to-profoundly hearing-impaired children (with no other handicapping condition) wearing sensory aids. As mentioned previously, children with severe and profound hearing impairments do not perceive speech well even when aided. Hence, any type of speech processing that might conceivably enhance the intelligibility of speech to these listeners should be explored for potential benefit.
Additionally, for hearing-impaired children, better speech perception scores are often associated with better spoken language skills (Boothroyd, Geers & Moog, 1991; Geers & Moog, 1992). So, a benefit in speech perception could also have a positive impact on the language skills of hearing-impaired children.

The envelope-amplification component of Tallal’s speech modification, in particular, appears promising for its potential to improve speech perception. Recently, Hazan and Simpson (1998) reported that explicit amplification of consonants and their subsequent formant transitions improved speech intelligibility in noise for listeners with normal hearing. Thus, if the envelope-amplification described by Nagarajan et al. (1998) does indeed amplify formant transitions while not introducing concomitant degradations, we might expect envelope-amplification to improve the intelligibility of speech for hearing-impaired listeners. Also, little is known about the effects of envelope-amplification on speech perception. This specific type of processing has not been studied and is not comparable to other types of processing, such as amplitude compression, that have been examined extensively (e.g., Moore, Peters & Stone, 1999; Plomp, 1988).

Over the years, the effects of time-expanded speech on intelligibility have been explored in young normal-hearing listeners (Schon, 1970; Korabic, Freeman & Church, 1978), in aged normal-hearing listeners (Schon, 1970; Schmitt, 1983; Gordon-Salant, 1986), in hearing-impaired listeners (Picheny, Durlach & Braida, 1989; Uchanski, Choi, Braida, Reed & Durlach, 1996), and in language-impaired or dyslexic listeners (Stollman, Kapteyn & Sleeswijk, 1994; McAnally, Hansen, Cornelissen & Stein, 1997).
Despite many differences amongst these studies (such as the language used, speech materials, listener characteristics, and the amount of time-expansion) there is general agreement that time-expansion does not significantly affect speech intelligibility. That is, time-expansion (by 50% and more) neither degrades nor improves speech intelligibility. The only studies that showed an improvement in intelligibility for a time-expanded speech signal were those that examined naturally produced clear speech. In these studies, an intelligibility advantage was found for naturally produced clear speech relative to conversational speech, for hearing-impaired adults and for normal-hearing listeners in noise (Picheny, Durlach & Braida, 1985; Payton, Uchanski & Braida, 1994; Uchanski et al., 1996). While clear speech is generally produced at a slower speaking rate (approximately 90-100 wpm for clear as compared to 160-200 wpm for conversational speech), there is growing evidence that clear speech is not equivalent to either naturally produced slow speech or artificially time-expanded conversational speech. All these types of speech (natural clear, natural slow, artificially time-expanded) differ significantly in intelligibility and in many acoustic properties other than duration (Moore & Zue, 1985; Picheny, Durlach & Braida, 1986; Moon & Lindblom, 1994; Krause, 1995; Fosler-Lussier & Morgan, 1999).

The effect of time-expansion on the ability to discriminate speech sounds is somewhat different from its effect on speech intelligibility or identification. For example, for listeners with normal hearing discriminating sounds in a [ba]-[da] continuum, Sussman and Carney (1989) found no effect of transition duration for 7- to 8-year-old children and a significant effect of transition duration for adults, 5- to 6-year-old children, and 9- to 10-year-old children.
For children with language disabilities, slowing down formant transitions consistently improves discrimination between synthetic speech sounds (Alexander & Frost, 1982; Tallal & Piercy, 1975) and seems to enhance the neural representations of synthetic /da/’s and /ga/’s (Bradlow et al., 1999).

We hypothesized that modified speech, with its presumably more salient speech sounds for LLI children, might be more intelligible than unmodified (unprocessed) speech for children with impaired hearing who wear hearing aids and/or cochlear implants. To test this hypothesis we examined the intelligibility of modified speech for children with impaired hearing. The speech modification applied by Tallal et al. (1996), known to be beneficial for training with LLI children, included both envelope-amplification and time-expansion, and thus it should preferably be evaluated as such. On the other hand, as discussed above, previous research with hearing-impaired persons indicated that time-expansion alone was unlikely to increase intelligibility, whereas envelope-amplification might be more successful. Because it is not possible to predict the effect of time-expansion in combination with envelope-amplification, we chose to evaluate all possible modification conditions. That is, for this study the two speech modification components, time-expansion and envelope-amplification, are evaluated separately for their effects on speech intelligibility. Additionally, there is a practical reason for determining these separate effects. A real-time implementation of envelope-amplification would preserve the natural synchrony between the visual and auditory signals that is critical for speechreading by hearing-impaired individuals. By contrast, time-expansion would destroy this natural synchrony between the visual and auditory speech signals.
Besides examining the effect of time-expansion and envelope-amplification on intelligibility, the effects of these modifications on speech discrimination were also examined. We chose to include a speech-discrimination task because of the promising results from studies of time-expansion on synthetic speech discrimination, and because it is possible for a speech modification to improve the perceptual discrimination of speech sounds without improving overall intelligibility. Thus, inclusion of this task allows another opportunity for uncovering a potential perceptual benefit from any of the speech modifications.

While the primary goal of this study was to determine the intelligibility benefit of modified speech for children with hearing impairment, a group of children with normal hearing was also tested. Tests with hearing-impaired children allowed us to assess the potential benefit of modified speech on intelligibility directly for this population of interest. However, tests with normal-hearing children (listening in noise to eliminate ceiling effects in performance) allowed us to assess the general effect of modified speech on intelligibility, for children with normal auditory processing skills. Also, speech presented to normal-hearing listeners will be affected only by the signal processing of the Tallal speech modification whereas speech presented to hearing-impaired listeners will be affected by the signal processing in the speech modification and by the signal processing (such as a compression algorithm) in the listener’s prosthetic hearing device. Consequently, speech perception results from hearing-impaired listeners might be confounded by an interaction between the two types of signal processing while speech perception results from normal-hearing listeners will not.

Method

Participants

Two groups of children participated in this study.
For the first group, children with bilateral, sensorineural hearing impairment were recruited from the CID school. All children at CID’s school who achieved a minimum score of 5 years on the receptive portion of the Peabody Picture Vocabulary Test (Dunn & Dunn, 1981) were recruited. Receptive vocabulary (PPVT) was used as the primary selection criterion to ensure that participants possessed a vocabulary level appropriate for the speech materials employed in the experiments. A total of eight children with impaired hearing agreed to participate in the study. In addition, the non-verbal cognitive function of these children was tested and determined to be in the normal range for their chronological age. Table 1 lists characteristics of these children such as pure-tone average, type of hearing device(s) used, age, and PPVT score. As shown in Table 1, a variety of losses, devices, ages, and equivalent receptive language ages (as based on PPVT) are represented in this group. Six children have profound hearing loss (hi1-hi5, hi7), one child has a moderate-to-severe loss (hi6), and one has normal hearing below 500 Hz with a sloping-to-moderate loss at 1000-8000 Hz (hi8). Four children wear cochlear implants, three wear hearing aids, and one child wears both a hearing aid and cochlear implant (hi7). Four of the five cochlear implants were programmed with the SPEAK processing strategy while one employed the MPEAK processing strategy (hi4). The hearing aids worn by these listeners also varied. The hearing aids worn by two subjects (hi7, hi8) used linear amplification with peak-clipping while the aids worn by others employed wide-band amplitude compression (hi5, hi6).

The second group consisted of five children with normal hearing. These participants were recruited from parents on the CID staff. They ranged in age from 7 to 11 years and spoke English as their native language.
Though the PPVT was not employed for the children with normal hearing, there were no known language impairments. On average, the normal-hearing group was younger (mean age: 9 years 5 months) than the hearing-impaired group (mean age: 12 years 4 months), but presumably had a higher mean PPVT age. Despite the difference in chronological age for the two groups of listeners, there was considerable overlap between the range of PPVT ages for the hearing-impaired group and the range of chronological ages for the normal-hearing group.

Speech-Processing Conditions

Four speech processing conditions were examined. These were: (1) original, unmodified speech (U); (2) speech that was uniformly slowed down or time-expanded by 50% (T); (3) speech modified by 20-dB amplification of time-frequency regions where the critical-band filtered spectral envelope contained energy in the 3-30 Hz range, i.e., amplification of the fast, transitional elements of speech (A); and (4) speech that was time-expanded and had its fast-varying elements amplified (TA).

New recordings of all the speech materials for this study were made by one male talker. This male talker was an experienced speaker, had made recordings for others (including Cochlear Corporation), and had a typical male fundamental frequency, F0 (mean F0 ~ 110 Hz). These recordings served as the unmodified (or unprocessed) speech materials. Speech for the remaining three conditions was processed at Scientific Learning Corporation using the same algorithms employed in their Fast ForWord training program. Below is a very brief description of the processing algorithms used in the T, A, and TA conditions. A detailed description is given in Nagarajan et al. (1998).

Time-expansion (the T condition) is achieved via a digital signal processing algorithm developed by Portnoff (1981).
This algorithm involves computation of the short-time Fourier transform, followed by linear interpolation and phase-modification to a new time-scale, and finally computation of the inverse Fourier transform to yield a time-expanded signal. The time-expansion algorithm is applied uniformly throughout the signal such that all speech segments (formant transitions, steady-state vowels and fricatives, silence gaps, etc.) are lengthened by 50%. For example, a 50-ms formant transition and an 80-ms fricative would become 75 ms and 120 ms in duration, respectively.

Envelope amplification (the A condition) is accomplished by an overlap-add procedure. Envelope signals from the equivalent of 22 critical-band-like band-pass filters are found by combining the absolute value of the short-time Fourier transform across the appropriate frequencies for each band signal. These 22 envelope signals are then band-pass filtered (3-30 Hz) and added back to the original envelope signals “to amplify fast elements while retaining the slower modulations in their original forms” (Nagarajan et al., 1998, p. 261). In addition, a fixed gain is applied to the envelope signals such that the frequency region from roughly 1000-3200 Hz (usually associated with F2) is amplified by 20 dB. Finally, the entire envelope-modified time signal is obtained by summing the short-time Fourier transforms using a weighted overlap-add procedure.

Both the time-expansion (T) and envelope-amplification (A) algorithms were applied to entire original speech signals without further intervention. That is, no phonetic labels or time-markings were used, and no explicit formant manipulations were made. Sample time-waveform and spectrogram displays of the word “bus” are shown in Figure 1 for each of the four speech processing conditions.

Speech Materials

A range of speech materials was selected, from word identification in sentence contexts to CV-syllable contrasts.
For each speech-processing condition, the following were employed. First, two lists of revised Bamford-Kowal-Bench (BKB) sentences, consisting of 100 keywords total, were used (Bamford & Wilson, 1979). Second, one list of the Word Intelligibility by Picture Identification (WIPI) test, consisting of 25 words total, was used (Ross & Lerman, 1971). These particular test materials were chosen because they contain vocabulary and syntax appropriate for young children with hearing impairment. Third, eight consonant-vowel (CV) syllables (/da/, /ga/, /ta/, /ti/, /tu/, /sa/, /ʃa/, /za/) were used with the VIDSPAC program (Boothroyd, 1997). Each CV-syllable was represented by three distinct tokens or utterances. These eight CV-syllables were paired to form eight contrasts (/da/ /ta/, /sa/ /za/, /da/ /za/, /sa/ /ta/, /da/ /ga/, /sa/ /ʃa/, /ti/ /tu/, and /ti/ /ta/): two contrasts each for the consonant features voicing, manner, and place, as well as two contrasts for vowel identity (height and place).

Presentation and Equipment

The tests were performed inside an IAC sound-isolated booth. The test examiner sat in the IAC booth with the child. For all conditions and for both groups, speech was presented in the free field using an Anchor AN-100 audio speaker. Free-field presentation, used for all children, was chosen to avoid feedback problems that might occur with the use of headphones on children wearing hearing aids or cochlear implants. Testing was completed in four half-hour sessions for the children with hearing impairment and two one-hour sessions for the children with normal hearing.

All speech was stored digitally with a sampling rate of 22.05 kHz, and each speech waveform was normalized to the same total rms level. The rms-normalization was performed digitally using a custom-written LabView (National Instruments) program. The rms normalization level was chosen such that no digital waveform was clipped when scaled in amplitude.
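The rms normalization just described can be illustrated with a minimal sketch. This is not the LabView program used in the study; the function names, the full-scale value of 1.0, and the rule for picking the common level (the highest level at which no waveform clips) are assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def normalize_to_common_rms(waveforms, full_scale=1.0):
    """Scale every waveform to one shared rms level without clipping.

    Assumes each waveform is non-silent (rms > 0). The shared level is the
    largest one for which every scaled peak stays within +/- full_scale
    (an assumed selection rule, not taken from the study).
    """
    # For each waveform: the highest rms it can reach before its peak clips.
    ceilings = [full_scale * rms(w) / max(abs(x) for x in w) for w in waveforms]
    target = min(ceilings)  # the most peaky waveform sets the common level
    scaled = [[x * target / rms(w) for x in w] for w in waveforms]
    return target, scaled
```

Under these assumptions, a waveform with a high peak-to-rms ratio limits the level of all the others, which is one way to satisfy the requirement that no waveform is clipped when scaled.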
A one-octave band of noise centered at 1 kHz was generated with the same rms level for use as a calibration signal. The sound level at the location of the subject’s head for this calibration signal was approximately 74 dBA, measured using a Brüel & Kjær sound-level meter equipped with a #4165 free-field microphone. For unprocessed speech, vowel-peaks in sentences correspond to levels roughly 6 to 12 dB higher than the total rms for a sentence.

A background noise was used for the children with normal hearing. This noise was created to prevent ceiling effects in the scores from this group. The level and spectral shape of this noise were designed to produce elevated audibility thresholds similar to those found for a moderate hearing loss (e.g., those of subject hi8). The noise was generated digitally by filtering white noise through a bank of twenty 1/3-octave, 4th-order Butterworth filters. The spectrum of the background noise is shown in Figure 2. The overall speech-to-noise ratio (SNR) for the children with normal hearing was roughly -4 dB. The background noise was gated on (and off) 50 ms before (and after) the start (and end) of each speech stimulus.

For the BKB and WIPI materials, audio presentation of the speech stimuli was controlled via custom-written LabView programs. Both the sentence and isolated-word speech tests were self-paced, giving the children ample time to respond, and were executed without feedback to the listener.

For the BKB sentences, participants were instructed to repeat the sentence that was presented auditorily. Children responded verbally and were encouraged to repeat any word or words they heard. Each sentence was presented only once to each listener, for a total of eight BKB lists (2 BKB lists/child/condition = 32 sentences/child/condition = 100 keywords/child/condition).
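The speech-to-noise ratio and noise gating described earlier in this section can be sketched as follows. This is a hedged illustration only: the study does not state how the overall SNR was computed, so the sketch assumes it is the ratio of speech rms to noise rms in dB, and the function name `mix_at_snr` and its parameters are hypothetical. The noise here is switched on and off abruptly, and no spectral shaping is included.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db, sample_rate, lead_ms=50.0):
    """Add noise to speech at a given SNR, with noise leading and trailing.

    Assumption: SNR = 20*log10(rms(speech) / rms(scaled noise)).
    The noise starts lead_ms before the speech and ends lead_ms after it.
    """
    pad = int(round(sample_rate * lead_ms / 1000.0))
    total = len(speech) + 2 * pad
    if len(noise) < total:
        raise ValueError("noise is too short for this stimulus")
    segment = noise[:total]
    # Gain that places the noise rms snr_db below the speech rms
    # (a negative snr_db makes the noise louder than the speech).
    gain = rms(speech) / (rms(segment) * 10 ** (snr_db / 20.0))
    padded_speech = [0.0] * pad + list(speech) + [0.0] * pad
    return [s + gain * n for s, n in zip(padded_speech, segment)]
```

With `snr_db=-4`, the scaled noise has an rms about 1.58 times that of the speech, which corresponds to the "roughly -4 dB" figure under the stated assumption.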
Responses were generally scored in real time by the examiner, and were recorded on audiotape for examination at a later time, as needed. Since the equivalent language age for many of the children with impaired hearing was around 5-6 years, and children of that age often make errors in noun-verb agreement, verb tense, etc., the responses to the sentences were scored somewhat liberally. For consistency, this scoring method was applied to all children. A word was scored correct if the root word was perceived correctly. Incorrect word endings, such as “-s” for plurals or “-ed” for verb tense, were ignored.

For the WIPI test, participants were instructed to point to the picture associated with the word that was presented auditorily. The WIPI picture foils were digitally scanned so that responses could be tabulated automatically via a screen-touch or mouse-click. One WIPI list was used per condition (1 list/child/condition = 25 words/child/condition).

The CV-syllable materials were used in a discrimination task that assessed a listener’s ability to hear differences between speech sounds. Syllables were presented via the computer-game-like VIDSPAC program. The VIDSPAC program presents pairs of speech stimuli in a standard-deviant paradigm, in which the standard is presented a random number of times (we chose a uniform distribution between 2 and 5) before the deviant is presented. The listener is instructed to respond when a different syllable sound is heard. For example, for the pair /da/ /ga/, the first syllable, /da/, is considered the standard and /ga/, the deviant sound. The syllable /da/ might be presented 4 times before /ga/ is presented in the 5th interval. If the listener hears the 5th interval (/ga/) as a sound different from the previous four sounds (in this case, the standard sound /da/), then the child responds by touching the screen on a designated image or by pressing the spacebar on the keyboard.
The listener, in this case, would be given credit for one correct response to the deviant sound. Two types of incorrect response or errors were possible for each “trial” or sequence of standard-deviant sounds. First, if the listener did not detect the deviant sound, i.e., did not make a response when the deviant was presented, then an error of omission was recorded (this reduces the number of “hits”). This type of error is analogous to a “miss” in signal detection theory (Green & Swets, 1974). Second, if the listener incorrectly responded (e.g., by pressing the spacebar) to one of the standard presentations, thinking it sounded different from the previous standard presentations, then the VIDSPAC program recorded this error as a false positive. This second type of error is analogous to a “false alarm” in signal detection theory (Green & Swets, 1974). The inter-stimulus interval was 1.5 s, and correct/incorrect feedback was provided implicitly through the actions of a cartoon character in the computer game.

Four standard-deviant trials were presented for each CV-syllable pair for each condition. For each presentation interval (standard or deviant), one of the three tokens for each syllable was chosen randomly. Thus, the listener was prevented from responding to either utterance-specific suprasegmental cues (e.g., syllable duration and F0) or non-phonetic artifacts. VIDSPAC tests were scored automatically by the VIDSPAC computer program.

In each half-hour session, each hearing-impaired child was randomly assigned (without replacement) two BKB lists, one WIPI list and one CV-list, with the signal processing condition also randomly assigned (without replacement) to each list. Their order of presentation varied randomly from subject to subject. For the children with normal hearing, two equivalent half-hour sessions were combined into one one-hour session.
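The standard-deviant trial structure and its scoring, as described above for the discrimination task, can be sketched in a few lines. This is an illustrative sketch and not the VIDSPAC program: the function names, the token labels, the inclusive reading of "between 2 and 5", and the assumption that a trial continues after a false positive are all hypothetical.

```python
import random


def make_trial(standard, deviant, tokens, rng, min_standards=2, max_standards=5):
    """Build one sequence: a random number of standards, then one deviant.

    Each interval draws one of the recorded tokens for its syllable at random,
    so the listener cannot rely on utterance-specific cues.
    """
    n_standards = rng.randint(min_standards, max_standards)  # uniform, inclusive
    labels = [standard] * n_standards + [deviant]
    return [(label, rng.choice(tokens[label])) for label in labels]


def score_trial(trial, responses):
    """Return (hits, misses, false_alarms) for one trial.

    responses holds one boolean per interval: True if the listener responded.
    A response to the final (deviant) interval is a hit, no response there is
    a miss, and each response to a standard interval is a false alarm.
    """
    if len(responses) != len(trial):
        raise ValueError("need exactly one response flag per interval")
    false_alarms = sum(1 for responded in responses[:-1] if responded)
    hit = 1 if responses[-1] else 0
    return hit, 1 - hit, false_alarms


# Hypothetical token labels: three recorded utterances per syllable.
tokens = {"da": ["da_1", "da_2", "da_3"], "ga": ["ga_1", "ga_2", "ga_3"]}
trial = make_trial("da", "ga", tokens, random.Random(0))
```

Summing these counts over the four trials per contrast would give per-condition hit and false-alarm totals of the kind reported by the scoring program.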
Publication date: 2002